
    Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction

    In the VP9 video codec, the sizes of blocks are decided during encoding by recursively partitioning 64×64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because of the combinatorial search space of possible partitions of a superblock. Here, we propose a deep learning based alternative framework that predicts intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and the corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce the intra-mode encoding time. The experimental results establish that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjontegaard delta bitrate (BD-rate). While VP9 provides several built-in speed levels that are designed to provide faster encoding at the expense of decreased rate-distortion performance, we find that our model outperforms the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate.
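
    To make the four-level partition tree concrete, the Python sketch below expands a set of per-level split decisions for a 64×64 superblock into its final coding blocks. The split-map layout, the function name, and the restriction to square split/no-split decisions are illustrative assumptions of this sketch; the actual H-FCN output encoding and the VP9 encoder integration differ.

```python
import numpy as np

def blocks_from_partition_tree(split_maps):
    """Expand a four-level partition tree of a 64x64 superblock into coding blocks.

    split_maps: illustrative format (an assumption, not the paper's encoding):
        a dict mapping block size -> boolean array of split flags, with shapes
        {64: (1, 1), 32: (2, 2), 16: (4, 4), 8: (8, 8)}; True means the block
        at that grid position is split into four quadrants.
    Returns a list of (x, y, size) leaf blocks; only square splits are modeled.
    """
    blocks = []

    def recurse(x, y, size):
        split = size in split_maps and bool(split_maps[size][y // size, x // size])
        if split and size > 4:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    recurse(x + dx, y + dy, half)
        else:
            blocks.append((x, y, size))

    recurse(0, 0, 64)
    return blocks

# Example: split the superblock once, then split only its top-left 32x32 block.
maps = {64: np.ones((1, 1), bool),
        32: np.zeros((2, 2), bool),
        16: np.zeros((4, 4), bool),
        8:  np.zeros((8, 8), bool)}
maps[32][0, 0] = True
print(blocks_from_partition_tree(maps))   # four 16x16 blocks plus three 32x32 blocks
```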

    Multiple Description Coding of Visual Information

    Nowadays, image and video compression are well-developed fields of signal processing. Modern state-of-the-art coders allow better compression with better quality. A newer field in signal processing is the representation of 3D scenes. 3D visual scenes may be captured by stereoscopic or multi-view camera settings. The captured multi-view video can be compressed directly or converted to more abstract 3D data representations, such as 3D dynamic meshes (mesh sequences), and efficiently compressed. In any case, efficiently compressed visual data has to be transmitted over communication channels, such as wireless channels or best-effort networks. This raises the problem of error protection, since most of these channels are error-prone.

    A common approach to error protection is to consider it as a pure channel problem, separate from the source compression problem. This approach is based on Shannon's work, which states that in principle source and channel coding tasks can be carried out independently with no loss of efficiency. However, this cannot be achieved in practice due to delay requirements and other problems. As an alternative, one can tolerate channel losses. Assuming that not all the data sent reaches the decoder, one can concentrate on ensuring efficient decoding of the correctly received data only. One way to achieve this is multiple description coding (MDC). The source is encoded into several descriptions, which are sent to the decoder independently over different channels. The decoder can reconstruct the source with lower yet acceptable quality from any single description received; better reconstruction quality is obtained from more descriptions.

    This thesis investigates MDC of images, video, stereoscopic video, and 3D meshes, thus validating MDC as an error resilience tool for various types of multimedia data. The thesis consists of four main chapters. Chapter 2 deals with MDC of images. It introduces an MDC algorithm based on a two-stage compression scheme employing B-spline-based image resizing, which is used to split the image into coarse and residual parts. The coarse part is included in both descriptions, while the residual part is split into two parts. A bit allocation algorithm optimizes the scheme for a given bit budget and probability of description loss. Chapter 3 addresses MDC of video. It presents a 3D-transform-based MD video coder targeted at mobile devices. The encoder has low computational complexity, and the compressed video is robust to transmission errors. The chapter estimates the scheme's encoding complexity and introduces an optimization procedure that minimizes the expected reconstruction distortion subject to the packet loss rate. In Chapter 4, MDC of stereoscopic video is addressed. Two MDC schemes are introduced, one based on spatial scaling and another based on temporal subsampling. A switching criterion makes it possible to decide which scheme is more advantageous for the sequence being encoded. Chapter 5 discusses MDC of 3D meshes. It introduces two MDC approaches for coding highly detailed 3D meshes. The schemes are able to produce multiple descriptions and are easily adaptable to a changing packet loss rate and bit budget. The proposed D-R curve modeling significantly decreases the computational load at the preparatory stage.
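
    As a rough illustration of the two-description idea described for Chapter 2, the Python sketch below builds a coarse part shared by both descriptions and splits the residual between them. Plain 2x block averaging and a checkerboard split stand in for the thesis's B-spline resizing and optimized bit allocation, and all names here are made up for the example.

```python
import numpy as np

def make_descriptions(img):
    """Toy two-description MDC of a grayscale image (float values, even H and W).

    The coarse part goes into both descriptions; the residual is split between
    them. Averaging and the checkerboard split are placeholders, not the
    thesis's scheme (no quantization, entropy coding, or bit allocation).
    """
    h, w = img.shape
    # Coarse part: 2x downscaled approximation (stand-in for B-spline resizing).
    coarse = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # Residual: what the upsampled coarse part misses.
    upsampled = np.kron(coarse, np.ones((2, 2)))
    residual = img - upsampled
    # Split the residual between the two descriptions (checkerboard of 2x2 blocks).
    mask = (np.add.outer(np.arange(h) // 2, np.arange(w) // 2) % 2).astype(bool)
    d1 = {"coarse": coarse, "residual": np.where(mask, residual, 0.0)}
    d2 = {"coarse": coarse, "residual": np.where(~mask, residual, 0.0)}
    return d1, d2

def reconstruct(*descriptions):
    """Reconstruct from any non-empty subset of descriptions."""
    est = np.kron(descriptions[0]["coarse"], np.ones((2, 2)))
    for d in descriptions:
        est = est + d["residual"]   # each description contributes its half of the residual
    return est

img = np.arange(64, dtype=float).reshape(8, 8)
d1, d2 = make_descriptions(img)
print(np.abs(reconstruct(d1) - img).max())       # side decoder: some distortion
print(np.abs(reconstruct(d1, d2) - img).max())   # central decoder: (near-)exact, ~0.0
```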

    Joint source-channel coding for error resilient transmission of static 3D models

    In this paper, a performance analysis of joint source-channel coding techniques for error-resilient transmission of three-dimensional (3D) models is presented. In particular, packet-based transmission scenarios are analyzed. The packet loss resilient methods are classified into two groups according to the progressive compression scheme employed: Compressed Progressive Meshes (CPM) based methods and wavelet based methods. In the first group, the layers of the CPM algorithm are protected unequally by Forward Error Correction (FEC) using Reed-Solomon (RS) codes. In the second group, the embedded bitstream obtained from wavelet based coding is likewise protected unequally with FEC. Both groups of methods are scalable with respect to both channel bandwidth and packet loss rate (PLR), i.e., they optimize the FEC assignment for the available channel bandwidth and the prevailing packet loss rate. An in-depth analysis of these techniques is carried out in terms of complexity, robustness to losses, and compression efficiency. Experimental results show that the wavelet based methods achieve considerably better quality than the CPM based methods.
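
    The expected-quality reasoning behind unequal protection of progressive layers can be sketched in Python as follows; this is an illustration under an i.i.d. packet-loss assumption with made-up layer parameters, whereas the paper optimizes the FEC assignment against channel bandwidth and PLR. A layer coded as RS(n, k) across n packets is decodable whenever at most n - k packets are lost, and a layer of a progressive bitstream is only useful if all earlier layers are decodable too.

```python
from math import comb

def prob_at_most_lost(n, m, p):
    """P(at most m of n packets are lost), assuming i.i.d. losses with rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1))

def expected_quality(ks, gains, n, p):
    """Expected quality when layer i is RS(n, ks[i])-coded across the same n
    packets and contributes gains[i] to reconstruction quality, but only if it
    and all earlier layers decode. With ks non-decreasing (earlier layers get
    more parity) these events are nested: layer i decodes exactly when at most
    n - ks[i] packets are lost, so the expectation is a simple sum."""
    return sum(g * prob_at_most_lost(n, n - k, p) for k, g in zip(ks, gains))

# Illustrative numbers only: earlier layers matter more (e.g. dB of quality, made up).
gains = [10.0, 5.0, 2.0, 1.0]
# At this fairly high loss rate, giving the important layers more parity
# (smaller k) scores higher than equal protection.
print(expected_quality(ks=[2, 4, 6, 8], gains=gains, n=10, p=0.4))
print(expected_quality(ks=[5, 5, 5, 5], gains=gains, n=10, p=0.4))
```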

    Efficient Per-Shot Convex Hull Prediction By Recurrent Learning

    Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content-dependent bitrate ladder selection requires a video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step is equivalent to an exhaustive search over the space of possible encoding parameters, which causes significant overhead in terms of both computation and time. To reduce this overhead, we propose a deep learning based method of content-aware convex hull prediction. We employ a recurrent convolutional network (RCN) to implicitly analyze the spatiotemporal complexity of video shots in order to predict their convex hulls. A two-step transfer learning scheme is adopted to train our proposed RCN-Hull model, which ensures sufficient content diversity for analyzing scene complexity, while also making it possible to capture the scene statistics of pristine source videos. Our experimental results reveal that the proposed model yields better approximations of the optimal convex hulls, and offers competitive time savings, as compared to existing approaches. On average, the pre-encoding time was reduced by 58.0% by our method, the average Bjontegaard delta bitrate (BD-rate) of the predicted convex hulls against the ground truth was 0.08%, and the mean absolute deviation of the BD-rate distribution was 0.44%.
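
    As a point of reference for what is being predicted, the short Python sketch below computes the upper convex hull of a set of (bitrate, quality) operating points, which is what the exhaustive pre-encoding step produces and what a predictor such as RCN-Hull is trained to approximate without encoding. The point values, units, and quality metric are made-up placeholders.

```python
def rate_quality_convex_hull(points):
    """Upper convex hull (monotone-chain) of (bitrate, quality) operating points.

    Each (resolution, QP) encode contributes one point; the bitrate ladder is
    read off the resulting frontier. points: iterable of (bitrate, quality)
    pairs, where quality could be VMAF or PSNR (illustrative units).
    Returns the hull points sorted by increasing bitrate.
    """
    pts = sorted(set(points))          # increasing bitrate, ties broken by quality
    hull = []
    for p in pts:
        # Pop points that would make the frontier bend upward (a non-right turn),
        # keeping only the concave "diminishing returns" envelope.
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# Example with a handful of hypothetical encodes: the dominated (1200, 78.0)
# point is excluded from the hull.
encodes = [(500, 70.0), (1000, 80.0), (1500, 84.0), (2000, 86.0), (1200, 78.0)]
print(rate_quality_convex_hull(encodes))
```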

    Low-Complexity Multiple Description Coding of Video Based on 3D Block Transforms

    The paper presents a multiple description (MD) video coder based on three-dimensional (3D) transforms. Two balanced descriptions are created from a video sequence. In the encoder, the video sequence is represented as a coarse sequence approximation (the shaper), which is included in both descriptions, and a residual sequence (the details), which is split between the two descriptions. The shaper is obtained by block-wise pruned 3D-DCT. The residual sequence is coded by a 3D-DCT or a hybrid LOT+DCT 3D transform. The coding scheme is targeted at mobile devices: it has low computational complexity and improved robustness of transmission over unreliable networks. The coder is able to work at very low redundancies. Although the coding scheme is simple, it outperforms some MD coders based on motion-compensated prediction, especially in the low-redundancy region; the margin is up to 3 dB for reconstruction from one description.
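
    A toy Python sketch of the shaper/details split on a single spatiotemporal block is given below. The pruned low-frequency corner and the alternating split of detail coefficients are simplifications of my own for illustration; quantization, entropy coding, and the LOT+DCT hybrid described in the paper are omitted.

```python
import numpy as np
from scipy.fft import dctn, idctn  # separable 3D DCT over spatiotemporal blocks

def md_encode_block(block, keep=4):
    """Toy two-description coding of one 8x8x8 block (time x height x width)."""
    coeffs = dctn(block, norm="ortho")
    shaper = np.zeros_like(coeffs)
    shaper[:keep, :keep, :keep] = coeffs[:keep, :keep, :keep]   # pruned low-frequency corner
    details = coeffs - shaper
    # Split the detail coefficients between the descriptions (alternate time-frequency slices).
    d1, d2 = details.copy(), details.copy()
    d1[1::2] = 0.0
    d2[0::2] = 0.0
    return (shaper, d1), (shaper, d2)   # duplicated shaper = the controlled redundancy

def md_decode(*descriptions):
    """Reconstruct from any non-empty subset of descriptions (side or central decoder)."""
    shaper = descriptions[0][0]
    coeffs = shaper + sum(d for _, d in descriptions)
    return idctn(coeffs, norm="ortho")

block = np.random.rand(8, 8, 8)
desc1, desc2 = md_encode_block(block)
print(np.abs(md_decode(desc1) - block).max())          # side decoder: approximation
print(np.abs(md_decode(desc1, desc2) - block).max())   # central decoder: ~0 (machine precision)
```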

    Packet loss resilient transmission of 3D models

    This paper presents an efficient joint source-channel coding scheme based on forward error correction (FEC) for three-dimensional (3D) models. The system employs a wavelet based zero-tree 3D mesh coder built on Progressive Geometry Compression (PGC). Reed-Solomon (RS) codes are applied to the embedded output bitstream to add resiliency to packet losses. A two-state Markovian channel model is employed to model packet losses. The proposed method applies approximately optimal and unequal FEC across packets; the scheme is therefore scalable to varying network bandwidth and packet loss rates (PLR). In addition, the distortion-rate (D-R) curve is modeled to decrease the computational complexity. Experimental results show that the proposed method achieves considerably better expected quality than previous packet-loss resilient schemes. Index Terms: visual communications, error correction, computer vision, multidimensional systems, wavelet transform, networks.
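
    To illustrate the channel model that drives the FEC assignment, the Python sketch below runs a Monte-Carlo estimate of the probability that an RS(n, k) block survives a two-state (Gilbert-Elliott style) Markov packet-loss channel. The transition probabilities, code parameters, and the assumption of starting in the good state are all placeholders; the paper computes an approximately optimal assignment rather than simulating it this way.

```python
import random

def simulate_loss_run(n, p_gb, p_bg, loss_good=0.0, loss_bad=1.0):
    """One realization of a two-state Markov packet-loss channel over n packets.

    p_gb: P(good -> bad), p_bg: P(bad -> good); loss probability per state.
    Starts in the good state (a simplification). Returns the number of lost packets."""
    state_bad, lost = False, 0
    for _ in range(n):
        if random.random() < (loss_bad if state_bad else loss_good):
            lost += 1
        # State transition for the next packet.
        if state_bad:
            state_bad = random.random() >= p_bg
        else:
            state_bad = random.random() < p_gb
    return lost

def prob_decodable(n, k, p_gb, p_bg, trials=20000):
    """Monte-Carlo estimate of P(an RS(n, k) block is decodable), i.e. at most
    n - k of its n packets are lost on the Markov channel. A quantity like this
    is what an unequal FEC assignment trades off against rate."""
    ok = sum(simulate_loss_run(n, p_gb, p_bg) <= n - k for _ in range(trials))
    return ok / trials

print(prob_decodable(n=20, k=14, p_gb=0.05, p_bg=0.4))
```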